Monthly Temperature Trends 🌡️📈
This project analyzes monthly temperature data from 1901 to 2017 using pandas and Plotly. It reshapes wide-format climate data into a tidy (long) format for time series visualization and seasonal trend analysis, then fits a simple decision tree regressor to the reshaped data.
Files Included
- monthly-temperature-trends.ipynb: Jupyter notebook with full analysis
- monthly-temperature-trends.html: Static HTML version for preview
- temp.csv: Source temperature data
- thumbnail.png: Visual summary for README
Highlights
- Data reshaping using pd.melt() for tidy time series
- Monthly temperature trends across decades
- Seasonal variation visualization
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In [2]:
temp = pd.read_csv("temp.csv")
In [3]:
temp.head()
Out[3]:
| | Unnamed: 0 | YEAR | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1901 | 17.99 | 19.43 | 23.49 | 26.41 | 28.28 | 28.60 | 27.49 | 26.98 | 26.26 | 25.08 | 21.73 | 18.95 |
| 1 | 1 | 1902 | 19.00 | 20.39 | 24.10 | 26.54 | 28.68 | 28.44 | 27.29 | 27.05 | 25.95 | 24.37 | 21.33 | 18.78 |
| 2 | 2 | 1903 | 18.32 | 19.79 | 22.46 | 26.03 | 27.93 | 28.41 | 28.04 | 26.63 | 26.34 | 24.57 | 20.96 | 18.29 |
| 3 | 3 | 1904 | 17.77 | 19.39 | 22.95 | 26.73 | 27.83 | 27.85 | 26.84 | 26.73 | 25.84 | 24.36 | 21.07 | 18.84 |
| 4 | 4 | 1905 | 17.40 | 17.79 | 21.78 | 24.84 | 28.32 | 28.69 | 27.67 | 27.47 | 26.29 | 26.16 | 22.07 | 18.71 |
In [4]:
temp = pd.read_csv("temp.csv", index_col = 0)
In [5]:
temp.head()
Out[5]:
| | YEAR | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | 17.99 | 19.43 | 23.49 | 26.41 | 28.28 | 28.60 | 27.49 | 26.98 | 26.26 | 25.08 | 21.73 | 18.95 |
| 1 | 1902 | 19.00 | 20.39 | 24.10 | 26.54 | 28.68 | 28.44 | 27.29 | 27.05 | 25.95 | 24.37 | 21.33 | 18.78 |
| 2 | 1903 | 18.32 | 19.79 | 22.46 | 26.03 | 27.93 | 28.41 | 28.04 | 26.63 | 26.34 | 24.57 | 20.96 | 18.29 |
| 3 | 1904 | 17.77 | 19.39 | 22.95 | 26.73 | 27.83 | 27.85 | 26.84 | 26.73 | 25.84 | 24.36 | 21.07 | 18.84 |
| 4 | 1905 | 17.40 | 17.79 | 21.78 | 24.84 | 28.32 | 28.69 | 27.67 | 27.47 | 26.29 | 26.16 | 22.07 | 18.71 |
In [6]:
temp.shape
Out[6]:
(117, 13)
In [7]:
# temp.T
In [8]:
# temp.transpose()
In [9]:
# pd.melt(temp,id_vars = "YEAR",value_vars = temp.columns[1:])
df = pd.melt(temp,id_vars = "YEAR")
In [10]:
df
Out[10]:
| | YEAR | variable | value |
|---|---|---|---|
| 0 | 1901 | JAN | 17.99 |
| 1 | 1902 | JAN | 19.00 |
| 2 | 1903 | JAN | 18.32 |
| 3 | 1904 | JAN | 17.77 |
| 4 | 1905 | JAN | 17.40 |
| ... | ... | ... | ... |
| 1399 | 2013 | DEC | 19.69 |
| 1400 | 2014 | DEC | 19.50 |
| 1401 | 2015 | DEC | 20.21 |
| 1402 | 2016 | DEC | 21.89 |
| 1403 | 2017 | DEC | 21.47 |
1404 rows × 3 columns
In [11]:
117*12
Out[11]:
1404
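As a quick check of the reshape above, the sketch below melts a small wide table and confirms that the long table has one row per year-month pair, which is why 117 years × 12 months gives 1404 rows. The values are copied from the first two rows shown earlier; the names `wide` and `long` are illustrative only.

```python
import pandas as pd

# Two years and three months taken from the head of the data shown above.
wide = pd.DataFrame({
    "YEAR": [1901, 1902],
    "JAN": [17.99, 19.00],
    "FEB": [19.43, 20.39],
    "MAR": [23.49, 24.10],
})

# id_vars stays as a column; every other column is unpivoted
# into (variable, value) pairs, one row per original cell.
long = pd.melt(wide, id_vars="YEAR")

print(long.shape)             # (6, 3): 2 years x 3 months
print(long.columns.tolist())  # ['YEAR', 'variable', 'value']
```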
In [12]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1404 entries, 0 to 1403
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   YEAR      1404 non-null   int64
 1   variable  1404 non-null   object
 2   value     1404 non-null   float64
dtypes: float64(1), int64(1), object(1)
memory usage: 33.0+ KB
In [13]:
date = df["variable"]+ " " + df["YEAR"].astype(str)
In [14]:
df["Date"] = pd.to_datetime(date, format = "%b %Y")  # explicit format: abbreviated month name + four-digit year
In [15]:
df
Out[15]:
| | YEAR | variable | value | Date |
|---|---|---|---|---|
| 0 | 1901 | JAN | 17.99 | 1901-01-01 |
| 1 | 1902 | JAN | 19.00 | 1902-01-01 |
| 2 | 1903 | JAN | 18.32 | 1903-01-01 |
| 3 | 1904 | JAN | 17.77 | 1904-01-01 |
| 4 | 1905 | JAN | 17.40 | 1905-01-01 |
| ... | ... | ... | ... | ... |
| 1399 | 2013 | DEC | 19.69 | 2013-12-01 |
| 1400 | 2014 | DEC | 19.50 | 2014-12-01 |
| 1401 | 2015 | DEC | 20.21 | 2015-12-01 |
| 1402 | 2016 | DEC | 21.89 | 2016-12-01 |
| 1403 | 2017 | DEC | 21.47 | 2017-12-01 |
1404 rows × 4 columns
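Strings such as `JAN 1901` can be parsed with an explicit format so that pandas does not have to guess. The following is a minimal sketch on three hand-written labels, assuming an English locale for the month abbreviations; the variable names are illustrative.

```python
import pandas as pd

labels = pd.Series(["JAN 1901", "FEB 1901", "DEC 2017"])

# %b = abbreviated month name, %Y = four-digit year.
# With no day in the string, the day defaults to the first of the month.
dates = pd.to_datetime(labels, format="%b %Y")

print(dates.dt.strftime("%Y-%m-%d").tolist())
# ['1901-01-01', '1901-02-01', '2017-12-01']
```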
In [16]:
df.sort_values("Date", inplace = True)
In [17]:
df
Out[17]:
| | YEAR | variable | value | Date |
|---|---|---|---|---|
| 0 | 1901 | JAN | 17.99 | 1901-01-01 |
| 117 | 1901 | FEB | 19.43 | 1901-02-01 |
| 234 | 1901 | MAR | 23.49 | 1901-03-01 |
| 351 | 1901 | APR | 26.41 | 1901-04-01 |
| 468 | 1901 | MAY | 28.28 | 1901-05-01 |
| ... | ... | ... | ... | ... |
| 935 | 2017 | AUG | 28.12 | 2017-08-01 |
| 1052 | 2017 | SEP | 28.11 | 2017-09-01 |
| 1169 | 2017 | OCT | 27.24 | 2017-10-01 |
| 1286 | 2017 | NOV | 23.92 | 2017-11-01 |
| 1403 | 2017 | DEC | 21.47 | 2017-12-01 |
1404 rows × 4 columns
In [18]:
df.reset_index(drop = True, inplace = True)
In [19]:
df.head()
Out[19]:
| | YEAR | variable | value | Date |
|---|---|---|---|---|
| 0 | 1901 | JAN | 17.99 | 1901-01-01 |
| 1 | 1901 | FEB | 19.43 | 1901-02-01 |
| 2 | 1901 | MAR | 23.49 | 1901-03-01 |
| 3 | 1901 | APR | 26.41 | 1901-04-01 |
| 4 | 1901 | MAY | 28.28 | 1901-05-01 |
In [20]:
df.columns
Out[20]:
Index(['YEAR', 'variable', 'value', 'Date'], dtype='object')
In [21]:
df.columns = ['Year', 'Month', 'Temprature', 'Date']
In [22]:
df
Out[22]:
| | Year | Month | Temprature | Date |
|---|---|---|---|---|
| 0 | 1901 | JAN | 17.99 | 1901-01-01 |
| 1 | 1901 | FEB | 19.43 | 1901-02-01 |
| 2 | 1901 | MAR | 23.49 | 1901-03-01 |
| 3 | 1901 | APR | 26.41 | 1901-04-01 |
| 4 | 1901 | MAY | 28.28 | 1901-05-01 |
| ... | ... | ... | ... | ... |
| 1399 | 2017 | AUG | 28.12 | 2017-08-01 |
| 1400 | 2017 | SEP | 28.11 | 2017-09-01 |
| 1401 | 2017 | OCT | 27.24 | 2017-10-01 |
| 1402 | 2017 | NOV | 23.92 | 2017-11-01 |
| 1403 | 2017 | DEC | 21.47 | 2017-12-01 |
1404 rows × 4 columns
In [23]:
df = df[['Date','Year', 'Month', 'Temprature' ]]
In [24]:
df
Out[24]:
| | Date | Year | Month | Temprature |
|---|---|---|---|---|
| 0 | 1901-01-01 | 1901 | JAN | 17.99 |
| 1 | 1901-02-01 | 1901 | FEB | 19.43 |
| 2 | 1901-03-01 | 1901 | MAR | 23.49 |
| 3 | 1901-04-01 | 1901 | APR | 26.41 |
| 4 | 1901-05-01 | 1901 | MAY | 28.28 |
| ... | ... | ... | ... | ... |
| 1399 | 2017-08-01 | 2017 | AUG | 28.12 |
| 1400 | 2017-09-01 | 2017 | SEP | 28.11 |
| 1401 | 2017-10-01 | 2017 | OCT | 27.24 |
| 1402 | 2017-11-01 | 2017 | NOV | 23.92 |
| 1403 | 2017-12-01 | 2017 | DEC | 21.47 |
1404 rows × 4 columns
In [25]:
# plotly
In [26]:
# !pip install plotly
In [27]:
mx = df["Temprature"].max()
mx
Out[27]:
30.78
In [28]:
import plotly.express as px
import plotly.graph_objects as go
In [29]:
# fig = plt.figure(figsize = (12,5))
fig = go.Figure(layout = go.Layout(yaxis = {"range":[0, mx+1]}))
In [30]:
fig = px.box(df, "Month", "Temprature")
fig.update_layout(title = "Monthly Temperature over Years")
fig.show()
In [31]:
# plt.subplot(3,4,1)
In [32]:
# plt.save_fig()
# fig.save_fig()
fig = px.line(df, "Year","Temprature", facet_col = "Month", facet_col_wrap = 4)
fig.show()
In [33]:
winter = ["JAN", "FEB", "DEC"]
summer = ["MAR", "APR","MAY"]
monsoon = ["JUN","JUL","AUG","SEP"]
autumn = ["OCT","NOV"]
In [34]:
temp["Winter"] = temp[winter].mean(axis = 1)
temp["Summer"] = temp[summer].mean(axis = 1)
temp["Monsoon"] = temp[monsoon].mean(axis = 1)
temp["Autumn"] = temp[autumn].mean(axis = 1)
In [35]:
temp
Out[35]:
| | YEAR | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC | Winter | Summer | Monsoon | Autumn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | 17.99 | 19.43 | 23.49 | 26.41 | 28.28 | 28.60 | 27.49 | 26.98 | 26.26 | 25.08 | 21.73 | 18.95 | 18.790000 | 26.060000 | 27.3325 | 23.405 |
| 1 | 1902 | 19.00 | 20.39 | 24.10 | 26.54 | 28.68 | 28.44 | 27.29 | 27.05 | 25.95 | 24.37 | 21.33 | 18.78 | 19.390000 | 26.440000 | 27.1825 | 22.850 |
| 2 | 1903 | 18.32 | 19.79 | 22.46 | 26.03 | 27.93 | 28.41 | 28.04 | 26.63 | 26.34 | 24.57 | 20.96 | 18.29 | 18.800000 | 25.473333 | 27.3550 | 22.765 |
| 3 | 1904 | 17.77 | 19.39 | 22.95 | 26.73 | 27.83 | 27.85 | 26.84 | 26.73 | 25.84 | 24.36 | 21.07 | 18.84 | 18.666667 | 25.836667 | 26.8150 | 22.715 |
| 4 | 1905 | 17.40 | 17.79 | 21.78 | 24.84 | 28.32 | 28.69 | 27.67 | 27.47 | 26.29 | 26.16 | 22.07 | 18.71 | 17.966667 | 24.980000 | 27.5300 | 24.115 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 112 | 2013 | 18.88 | 21.07 | 24.53 | 26.97 | 29.06 | 28.24 | 27.50 | 27.22 | 26.87 | 25.63 | 22.18 | 19.69 | 19.880000 | 26.853333 | 27.4575 | 23.905 |
| 113 | 2014 | 18.81 | 20.35 | 23.34 | 26.91 | 28.45 | 29.42 | 28.07 | 27.42 | 26.61 | 25.38 | 22.53 | 19.50 | 19.553333 | 26.233333 | 27.8800 | 23.955 |
| 114 | 2015 | 19.02 | 21.23 | 23.52 | 26.52 | 28.82 | 28.15 | 28.03 | 27.64 | 27.04 | 25.82 | 22.95 | 20.21 | 20.153333 | 26.286667 | 27.7150 | 24.385 |
| 115 | 2016 | 20.92 | 23.58 | 26.61 | 29.56 | 30.41 | 29.70 | 28.18 | 28.17 | 27.72 | 26.81 | 23.90 | 21.89 | 22.130000 | 28.860000 | 28.4425 | 25.355 |
| 116 | 2017 | 20.59 | 23.08 | 25.58 | 29.17 | 30.47 | 29.44 | 28.31 | 28.12 | 28.11 | 27.24 | 23.92 | 21.47 | 21.713333 | 28.406667 | 28.4950 | 25.580 |
117 rows × 17 columns
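The seasonal columns are row-wise means over groups of month columns. As a sketch, the same computation can be driven by a single mapping from season name to months. The data below are the first two rows of the table above; `seasons` and `seasonal` are illustrative names.

```python
import pandas as pd

# First two rows of the wide table shown above.
wide = pd.DataFrame({
    "YEAR": [1901, 1902],
    "JAN": [17.99, 19.00], "FEB": [19.43, 20.39], "MAR": [23.49, 24.10],
    "APR": [26.41, 26.54], "MAY": [28.28, 28.68], "JUN": [28.60, 28.44],
    "JUL": [27.49, 27.29], "AUG": [26.98, 27.05], "SEP": [26.26, 25.95],
    "OCT": [25.08, 24.37], "NOV": [21.73, 21.33], "DEC": [18.95, 18.78],
})

# Same grouping as in the notebook cells above.
seasons = {
    "Winter": ["JAN", "FEB", "DEC"],
    "Summer": ["MAR", "APR", "MAY"],
    "Monsoon": ["JUN", "JUL", "AUG", "SEP"],
    "Autumn": ["OCT", "NOV"],
}

# One row-wise mean per season, collected into a new frame.
seasonal = pd.DataFrame(
    {name: wide[months].mean(axis=1) for name, months in seasons.items()}
)
seasonal.insert(0, "YEAR", wide["YEAR"])

print(seasonal.round(4))
```

Note that Winter here averages January, February and December of the same calendar year, as in the notebook, rather than a contiguous December-to-February season.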
In [36]:
temp.columns
Out[36]:
Index(['YEAR', 'JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP',
'OCT', 'NOV', 'DEC', 'Winter', 'Summer', 'Monsoon', 'Autumn'],
dtype='object')
In [37]:
season = temp[['YEAR','Winter', 'Summer', 'Monsoon', 'Autumn']]
In [38]:
season
Out[38]:
| | YEAR | Winter | Summer | Monsoon | Autumn |
|---|---|---|---|---|---|
| 0 | 1901 | 18.790000 | 26.060000 | 27.3325 | 23.405 |
| 1 | 1902 | 19.390000 | 26.440000 | 27.1825 | 22.850 |
| 2 | 1903 | 18.800000 | 25.473333 | 27.3550 | 22.765 |
| 3 | 1904 | 18.666667 | 25.836667 | 26.8150 | 22.715 |
| 4 | 1905 | 17.966667 | 24.980000 | 27.5300 | 24.115 |
| ... | ... | ... | ... | ... | ... |
| 112 | 2013 | 19.880000 | 26.853333 | 27.4575 | 23.905 |
| 113 | 2014 | 19.553333 | 26.233333 | 27.8800 | 23.955 |
| 114 | 2015 | 20.153333 | 26.286667 | 27.7150 | 24.385 |
| 115 | 2016 | 22.130000 | 28.860000 | 28.4425 | 25.355 |
| 116 | 2017 | 21.713333 | 28.406667 | 28.4950 | 25.580 |
117 rows × 5 columns
In [39]:
season = pd.melt(season, id_vars = "YEAR")
In [40]:
season
Out[40]:
| | YEAR | variable | value |
|---|---|---|---|
| 0 | 1901 | Winter | 18.790000 |
| 1 | 1902 | Winter | 19.390000 |
| 2 | 1903 | Winter | 18.800000 |
| 3 | 1904 | Winter | 18.666667 |
| 4 | 1905 | Winter | 17.966667 |
| ... | ... | ... | ... |
| 463 | 2013 | Autumn | 23.905000 |
| 464 | 2014 | Autumn | 23.955000 |
| 465 | 2015 | Autumn | 24.385000 |
| 466 | 2016 | Autumn | 25.355000 |
| 467 | 2017 | Autumn | 25.580000 |
468 rows × 3 columns
In [41]:
season.columns
Out[41]:
Index(['YEAR', 'variable', 'value'], dtype='object')
In [42]:
season.columns = ['Year', 'Season', 'Avg_Temp']
In [43]:
season
Out[43]:
| | Year | Season | Avg_Temp |
|---|---|---|---|
| 0 | 1901 | Winter | 18.790000 |
| 1 | 1902 | Winter | 19.390000 |
| 2 | 1903 | Winter | 18.800000 |
| 3 | 1904 | Winter | 18.666667 |
| 4 | 1905 | Winter | 17.966667 |
| ... | ... | ... | ... |
| 463 | 2013 | Autumn | 23.905000 |
| 464 | 2014 | Autumn | 23.955000 |
| 465 | 2015 | Autumn | 24.385000 |
| 466 | 2016 | Autumn | 25.355000 |
| 467 | 2017 | Autumn | 25.580000 |
468 rows × 3 columns
In [44]:
fig = px.scatter(season, "Year", "Avg_Temp", facet_col = "Season", facet_col_wrap = 2,
trendline = "ols")
fig.show()
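The `trendline = "ols"` option draws an ordinary least squares line in each facet (Plotly Express relies on statsmodels for this). To read off a slope as a number, a straight-line fit with NumPy is one possible sketch. It uses only the five Winter values for 1901-1905 from the table above, so the resulting slope illustrates the calculation and says nothing about the long-term trend.

```python
import numpy as np

# Winter means for 1901-1905, copied from the table above.
years = np.array([1901, 1902, 1903, 1904, 1905], dtype=float)
winter = np.array([18.79, 19.39, 18.80, 18.666667, 17.966667])

# Centre the years: the intercept then equals the mean temperature
# and the least-squares problem is well conditioned.
x = years - years.mean()
slope, intercept = np.polyfit(x, winter, deg=1)

print(f"slope: {slope:.4f} temperature units per year")
print(f"mean level: {intercept:.4f}")
```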
In [45]:
df.head()
Out[45]:
| | Date | Year | Month | Temprature |
|---|---|---|---|---|
| 0 | 1901-01-01 | 1901 | JAN | 17.99 |
| 1 | 1901-02-01 | 1901 | FEB | 19.43 |
| 2 | 1901-03-01 | 1901 | MAR | 23.49 |
| 3 | 1901-04-01 | 1901 | APR | 26.41 |
| 4 | 1901-05-01 | 1901 | MAY | 28.28 |
In [46]:
px.scatter(df,"Month", "Temprature", animation_frame = "Year", size = "Temprature")
In [47]:
temp
Out[47]:
| | YEAR | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC | Winter | Summer | Monsoon | Autumn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | 17.99 | 19.43 | 23.49 | 26.41 | 28.28 | 28.60 | 27.49 | 26.98 | 26.26 | 25.08 | 21.73 | 18.95 | 18.790000 | 26.060000 | 27.3325 | 23.405 |
| 1 | 1902 | 19.00 | 20.39 | 24.10 | 26.54 | 28.68 | 28.44 | 27.29 | 27.05 | 25.95 | 24.37 | 21.33 | 18.78 | 19.390000 | 26.440000 | 27.1825 | 22.850 |
| 2 | 1903 | 18.32 | 19.79 | 22.46 | 26.03 | 27.93 | 28.41 | 28.04 | 26.63 | 26.34 | 24.57 | 20.96 | 18.29 | 18.800000 | 25.473333 | 27.3550 | 22.765 |
| 3 | 1904 | 17.77 | 19.39 | 22.95 | 26.73 | 27.83 | 27.85 | 26.84 | 26.73 | 25.84 | 24.36 | 21.07 | 18.84 | 18.666667 | 25.836667 | 26.8150 | 22.715 |
| 4 | 1905 | 17.40 | 17.79 | 21.78 | 24.84 | 28.32 | 28.69 | 27.67 | 27.47 | 26.29 | 26.16 | 22.07 | 18.71 | 17.966667 | 24.980000 | 27.5300 | 24.115 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 112 | 2013 | 18.88 | 21.07 | 24.53 | 26.97 | 29.06 | 28.24 | 27.50 | 27.22 | 26.87 | 25.63 | 22.18 | 19.69 | 19.880000 | 26.853333 | 27.4575 | 23.905 |
| 113 | 2014 | 18.81 | 20.35 | 23.34 | 26.91 | 28.45 | 29.42 | 28.07 | 27.42 | 26.61 | 25.38 | 22.53 | 19.50 | 19.553333 | 26.233333 | 27.8800 | 23.955 |
| 114 | 2015 | 19.02 | 21.23 | 23.52 | 26.52 | 28.82 | 28.15 | 28.03 | 27.64 | 27.04 | 25.82 | 22.95 | 20.21 | 20.153333 | 26.286667 | 27.7150 | 24.385 |
| 115 | 2016 | 20.92 | 23.58 | 26.61 | 29.56 | 30.41 | 29.70 | 28.18 | 28.17 | 27.72 | 26.81 | 23.90 | 21.89 | 22.130000 | 28.860000 | 28.4425 | 25.355 |
| 116 | 2017 | 20.59 | 23.08 | 25.58 | 29.17 | 30.47 | 29.44 | 28.31 | 28.12 | 28.11 | 27.24 | 23.92 | 21.47 | 21.713333 | 28.406667 | 28.4950 | 25.580 |
117 rows × 17 columns
In [48]:
temp.columns
Out[48]:
Index(['YEAR', 'JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP',
'OCT', 'NOV', 'DEC', 'Winter', 'Summer', 'Monsoon', 'Autumn'],
dtype='object')
In [49]:
tm = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP','OCT', 'NOV', 'DEC']
In [50]:
temp["Yearly_Mean"] = temp[tm].mean(axis = 1)
In [51]:
temp
Out[51]:
| | YEAR | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC | Winter | Summer | Monsoon | Autumn | Yearly_Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | 17.99 | 19.43 | 23.49 | 26.41 | 28.28 | 28.60 | 27.49 | 26.98 | 26.26 | 25.08 | 21.73 | 18.95 | 18.790000 | 26.060000 | 27.3325 | 23.405 | 24.224167 |
| 1 | 1902 | 19.00 | 20.39 | 24.10 | 26.54 | 28.68 | 28.44 | 27.29 | 27.05 | 25.95 | 24.37 | 21.33 | 18.78 | 19.390000 | 26.440000 | 27.1825 | 22.850 | 24.326667 |
| 2 | 1903 | 18.32 | 19.79 | 22.46 | 26.03 | 27.93 | 28.41 | 28.04 | 26.63 | 26.34 | 24.57 | 20.96 | 18.29 | 18.800000 | 25.473333 | 27.3550 | 22.765 | 23.980833 |
| 3 | 1904 | 17.77 | 19.39 | 22.95 | 26.73 | 27.83 | 27.85 | 26.84 | 26.73 | 25.84 | 24.36 | 21.07 | 18.84 | 18.666667 | 25.836667 | 26.8150 | 22.715 | 23.850000 |
| 4 | 1905 | 17.40 | 17.79 | 21.78 | 24.84 | 28.32 | 28.69 | 27.67 | 27.47 | 26.29 | 26.16 | 22.07 | 18.71 | 17.966667 | 24.980000 | 27.5300 | 24.115 | 23.932500 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 112 | 2013 | 18.88 | 21.07 | 24.53 | 26.97 | 29.06 | 28.24 | 27.50 | 27.22 | 26.87 | 25.63 | 22.18 | 19.69 | 19.880000 | 26.853333 | 27.4575 | 23.905 | 24.820000 |
| 113 | 2014 | 18.81 | 20.35 | 23.34 | 26.91 | 28.45 | 29.42 | 28.07 | 27.42 | 26.61 | 25.38 | 22.53 | 19.50 | 19.553333 | 26.233333 | 27.8800 | 23.955 | 24.732500 |
| 114 | 2015 | 19.02 | 21.23 | 23.52 | 26.52 | 28.82 | 28.15 | 28.03 | 27.64 | 27.04 | 25.82 | 22.95 | 20.21 | 20.153333 | 26.286667 | 27.7150 | 24.385 | 24.912500 |
| 115 | 2016 | 20.92 | 23.58 | 26.61 | 29.56 | 30.41 | 29.70 | 28.18 | 28.17 | 27.72 | 26.81 | 23.90 | 21.89 | 22.130000 | 28.860000 | 28.4425 | 25.355 | 26.454167 |
| 116 | 2017 | 20.59 | 23.08 | 25.58 | 29.17 | 30.47 | 29.44 | 28.31 | 28.12 | 28.11 | 27.24 | 23.92 | 21.47 | 21.713333 | 28.406667 | 28.4950 | 25.580 | 26.291667 |
117 rows × 18 columns
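As a worked check of `Yearly_Mean`, the 1901 value can be reproduced by hand from the twelve monthly values in the first row of the table. This is plain arithmetic, shown here as a short sketch.

```python
# Monthly values for 1901, JAN through DEC, copied from the table above.
months_1901 = [17.99, 19.43, 23.49, 26.41, 28.28, 28.60,
               27.49, 26.98, 26.26, 25.08, 21.73, 18.95]

# Unweighted arithmetic mean of the twelve months.
yearly_mean_1901 = sum(months_1901) / len(months_1901)

print(round(yearly_mean_1901, 6))  # 24.224167
```

This is an unweighted mean of the monthly values; it does not weight months by their number of days.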
In [52]:
fig = go.Figure(data = [go.Scatter(x = temp["YEAR"], y = temp["Yearly_Mean"],mode = "lines"),
go.Scatter(x = temp["YEAR"], y = temp["Yearly_Mean"],mode = "markers")])
fig.update_layout(xaxis_title = "Year", yaxis_title = "Mean Temp", title = "Yearly Mean Temperature, 1901-2017")
fig.show()
In [53]:
df
Out[53]:
| | Date | Year | Month | Temprature |
|---|---|---|---|---|
| 0 | 1901-01-01 | 1901 | JAN | 17.99 |
| 1 | 1901-02-01 | 1901 | FEB | 19.43 |
| 2 | 1901-03-01 | 1901 | MAR | 23.49 |
| 3 | 1901-04-01 | 1901 | APR | 26.41 |
| 4 | 1901-05-01 | 1901 | MAY | 28.28 |
| ... | ... | ... | ... | ... |
| 1399 | 2017-08-01 | 2017 | AUG | 28.12 |
| 1400 | 2017-09-01 | 2017 | SEP | 28.11 |
| 1401 | 2017-10-01 | 2017 | OCT | 27.24 |
| 1402 | 2017-11-01 | 2017 | NOV | 23.92 |
| 1403 | 2017-12-01 | 2017 | DEC | 21.47 |
1404 rows × 4 columns
In [54]:
df.columns
Out[54]:
Index(['Date', 'Year', 'Month', 'Temprature'], dtype='object')
In [55]:
ml = df[['Year', 'Month', 'Temprature']]
In [56]:
ml.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1404 entries, 0 to 1403
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Year        1404 non-null   int64
 1   Month       1404 non-null   object
 2   Temprature  1404 non-null   float64
dtypes: float64(1), int64(1), object(1)
memory usage: 33.0+ KB
In [57]:
data = pd.get_dummies(ml)
In [58]:
data.head()
Out[58]:
| | Year | Temprature | Month_APR | Month_AUG | Month_DEC | Month_FEB | Month_JAN | Month_JUL | Month_JUN | Month_MAR | Month_MAY | Month_NOV | Month_OCT | Month_SEP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | 17.99 | False | False | False | False | True | False | False | False | False | False | False | False |
| 1 | 1901 | 19.43 | False | False | False | True | False | False | False | False | False | False | False | False |
| 2 | 1901 | 23.49 | False | False | False | False | False | False | False | True | False | False | False | False |
| 3 | 1901 | 26.41 | True | False | False | False | False | False | False | False | False | False | False | False |
| 4 | 1901 | 28.28 | False | False | False | False | False | False | False | False | True | False | False | False |
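The table above shows the effect of `pd.get_dummies`: numeric columns pass through unchanged, and the text column `Month` is replaced by one indicator column per distinct value, ordered alphabetically. A minimal sketch on three rows from the data follows; `toy` and `encoded` are illustrative names.

```python
import pandas as pd

toy = pd.DataFrame({
    "Year": [1901, 1901, 1901],
    "Month": ["JAN", "FEB", "MAR"],
    "Temprature": [17.99, 19.43, 23.49],  # column name spelled as in the notebook
})

# Only the text column is encoded; Year and Temprature are kept as they are.
encoded = pd.get_dummies(toy)

print(encoded.columns.tolist())
# ['Year', 'Temprature', 'Month_FEB', 'Month_JAN', 'Month_MAR']
```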
In [59]:
x = data.drop("Temprature", axis = 1).copy()
y = data["Temprature"]
In [60]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
In [61]:
x_train, x_test,y_train, y_test = train_test_split(x,y, test_size =0.3)
In [62]:
print(x_train.shape, x_test.shape,y_train.shape, y_test.shape)
(982, 13) (422, 13) (982,) (422,)
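With `test_size = 0.3` and 1404 rows, the split yields 422 test rows and 982 training rows, as printed above. Because no seed is passed, the rows that land in each part change from run to run. The sketch below uses stand-in arrays (not the notebook's data) and an arbitrary `random_state = 0` to show one way to make the partition repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1404).reshape(-1, 1)  # stand-in features, one row per month
y = np.arange(1404)                 # stand-in target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
print(X_train.shape, X_test.shape)  # (982, 1) (422, 1)

# Splitting again with the same seed gives the same partition.
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.3, random_state=0)
print(np.array_equal(X_train, X_train2))  # True
```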
In [63]:
dtr = DecisionTreeRegressor()
In [64]:
dtr.fit(x_train, y_train)
Out[64]:
DecisionTreeRegressor()
In [65]:
predict = dtr.predict(x_test)
In [66]:
from sklearn.metrics import r2_score  # accuracy_score is a classification metric and is not used here
In [67]:
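# Note: r2_score is documented as r2_score(y_true, y_pred). The arguments below are
# swapped, so the score treats the predictions as the reference values.
r2_score(predict, y_test)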
Out[67]:
0.9644413279756798
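scikit-learn documents the signature as `r2_score(y_true, y_pred)`, and the metric is not symmetric in its arguments, because the total sum of squares is taken around the mean of the first argument. The hand-rolled function below is a sketch of the textbook definition on made-up numbers, not scikit-learn's implementation; it shows that swapping the arguments changes the result.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread of the reference values
    return 1.0 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]      # made-up reference values
predicted = [1.5, 2.0, 3.0, 4.5]   # made-up predictions

forward = r_squared(actual, predicted)
swapped = r_squared(predicted, actual)
print(forward)  # 0.9
print(swapped)  # about 0.9048
```

For a close fit the two values are usually similar, but only the `(y_true, y_pred)` order matches the usual definition.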
In [68]:
temp["YEAR"].max()
Out[68]:
2017
In [69]:
x_test
Out[69]:
| | Year | Month_APR | Month_AUG | Month_DEC | Month_FEB | Month_JAN | Month_JUL | Month_JUN | Month_MAR | Month_MAY | Month_NOV | Month_OCT | Month_SEP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1212 | 2002 | False | False | False | False | True | False | False | False | False | False | False | False |
| 527 | 1944 | False | False | True | False | False | False | False | False | False | False | False | False |
| 1333 | 2012 | False | False | False | True | False | False | False | False | False | False | False | False |
| 1353 | 2013 | False | False | False | False | False | False | False | False | False | False | True | False |
| 1291 | 2008 | False | True | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 294 | 1925 | False | False | False | False | False | True | False | False | False | False | False | False |
| 571 | 1948 | False | True | False | False | False | False | False | False | False | False | False | False |
| 39 | 1904 | True | False | False | False | False | False | False | False | False | False | False | False |
| 420 | 1936 | False | False | False | False | True | False | False | False | False | False | False | False |
| 873 | 1973 | False | False | False | False | False | False | False | False | False | False | True | False |
422 rows × 13 columns
In [70]:
df
Out[70]:
| | Date | Year | Month | Temprature |
|---|---|---|---|---|
| 0 | 1901-01-01 | 1901 | JAN | 17.99 |
| 1 | 1901-02-01 | 1901 | FEB | 19.43 |
| 2 | 1901-03-01 | 1901 | MAR | 23.49 |
| 3 | 1901-04-01 | 1901 | APR | 26.41 |
| 4 | 1901-05-01 | 1901 | MAY | 28.28 |
| ... | ... | ... | ... | ... |
| 1399 | 2017-08-01 | 2017 | AUG | 28.12 |
| 1400 | 2017-09-01 | 2017 | SEP | 28.11 |
| 1401 | 2017-10-01 | 2017 | OCT | 27.24 |
| 1402 | 2017-11-01 | 2017 | NOV | 23.92 |
| 1403 | 2017-12-01 | 2017 | DEC | 21.47 |
1404 rows × 4 columns
In [71]:
next_year = df[df["Year"]==2017][["Year","Month"]]
next_year["Year"] = next_year["Year"].replace(2017, 2018)
In [72]:
next_year
Out[72]:
| | Year | Month |
|---|---|---|
| 1392 | 2018 | JAN |
| 1393 | 2018 | FEB |
| 1394 | 2018 | MAR |
| 1395 | 2018 | APR |
| 1396 | 2018 | MAY |
| 1397 | 2018 | JUN |
| 1398 | 2018 | JUL |
| 1399 | 2018 | AUG |
| 1400 | 2018 | SEP |
| 1401 | 2018 | OCT |
| 1402 | 2018 | NOV |
| 1403 | 2018 | DEC |
In [73]:
test = pd.get_dummies(next_year)
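Encoding the new rows separately works here because all twelve months are present, so the dummy columns come out in the same order as during training. When a new frame might lack some categories, one possible safeguard is to reindex it to the training columns. The sketch below uses small made-up frames; `train`, `new` and `train_cols` are illustrative names.

```python
import pandas as pd

train = pd.DataFrame({"Year": [2016, 2016, 2017], "Month": ["JAN", "FEB", "MAR"]})
new = pd.DataFrame({"Year": [2018], "Month": ["FEB"]})  # only one month present

# Column layout the model was trained on.
train_cols = pd.get_dummies(train).columns

# Reorder to the training layout and add any missing indicator columns as False.
new_encoded = pd.get_dummies(new).reindex(columns=train_cols, fill_value=False)

print(new_encoded.columns.tolist())
# ['Year', 'Month_FEB', 'Month_JAN', 'Month_MAR']
```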
In [74]:
dtr.predict(test)
Out[74]:
array([19.02, 23.08, 25.58, 29.17, 30.41, 29.44, 28.31, 28.17, 28.11,
27.24, 23.92, 21.47])
In [75]:
np.array(df[df["Year"]==2017]["Temprature"])
Out[75]:
array([20.59, 23.08, 25.58, 29.17, 30.47, 29.44, 28.31, 28.12, 28.11,
27.24, 23.92, 21.47])
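Every 2018 prediction above also appears as an observed value in the most recent years of the data: most match 2017 exactly, while 19.02 for JAN matches 2015 and 30.41 for MAY matches 2016. This is consistent with how a decision tree predicts: it returns the mean target of a training leaf, so for an input beyond the training range it can only reuse values from the nearest leaf. The toy example below, which is not the notebook's model, illustrates the effect on a perfectly linear target.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Perfectly linear toy data: y = 2x for x = 0..9.
X = np.arange(10).reshape(-1, 1)
y = 2.0 * X.ravel()

tree = DecisionTreeRegressor(random_state=0).fit(X, y)

# x = 100 lies far outside the training range, but it falls into the same
# leaf as x = 9, so the tree returns that leaf's value instead of 200.
preds = tree.predict(np.array([[9], [100]]))
print(preds)  # [18. 18.]
```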
In [76]:
df
Out[76]:
| | Date | Year | Month | Temprature |
|---|---|---|---|---|
| 0 | 1901-01-01 | 1901 | JAN | 17.99 |
| 1 | 1901-02-01 | 1901 | FEB | 19.43 |
| 2 | 1901-03-01 | 1901 | MAR | 23.49 |
| 3 | 1901-04-01 | 1901 | APR | 26.41 |
| 4 | 1901-05-01 | 1901 | MAY | 28.28 |
| ... | ... | ... | ... | ... |
| 1399 | 2017-08-01 | 2017 | AUG | 28.12 |
| 1400 | 2017-09-01 | 2017 | SEP | 28.11 |
| 1401 | 2017-10-01 | 2017 | OCT | 27.24 |
| 1402 | 2017-11-01 | 2017 | NOV | 23.92 |
| 1403 | 2017-12-01 | 2017 | DEC | 21.47 |
1404 rows × 4 columns
In [77]:
season
Out[77]:
| | Year | Season | Avg_Temp |
|---|---|---|---|
| 0 | 1901 | Winter | 18.790000 |
| 1 | 1902 | Winter | 19.390000 |
| 2 | 1903 | Winter | 18.800000 |
| 3 | 1904 | Winter | 18.666667 |
| 4 | 1905 | Winter | 17.966667 |
| ... | ... | ... | ... |
| 463 | 2013 | Autumn | 23.905000 |
| 464 | 2014 | Autumn | 23.955000 |
| 465 | 2015 | Autumn | 24.385000 |
| 466 | 2016 | Autumn | 25.355000 |
| 467 | 2017 | Autumn | 25.580000 |
468 rows × 3 columns
In [78]:
temp
Out[78]:
| | YEAR | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC | Winter | Summer | Monsoon | Autumn | Yearly_Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | 17.99 | 19.43 | 23.49 | 26.41 | 28.28 | 28.60 | 27.49 | 26.98 | 26.26 | 25.08 | 21.73 | 18.95 | 18.790000 | 26.060000 | 27.3325 | 23.405 | 24.224167 |
| 1 | 1902 | 19.00 | 20.39 | 24.10 | 26.54 | 28.68 | 28.44 | 27.29 | 27.05 | 25.95 | 24.37 | 21.33 | 18.78 | 19.390000 | 26.440000 | 27.1825 | 22.850 | 24.326667 |
| 2 | 1903 | 18.32 | 19.79 | 22.46 | 26.03 | 27.93 | 28.41 | 28.04 | 26.63 | 26.34 | 24.57 | 20.96 | 18.29 | 18.800000 | 25.473333 | 27.3550 | 22.765 | 23.980833 |
| 3 | 1904 | 17.77 | 19.39 | 22.95 | 26.73 | 27.83 | 27.85 | 26.84 | 26.73 | 25.84 | 24.36 | 21.07 | 18.84 | 18.666667 | 25.836667 | 26.8150 | 22.715 | 23.850000 |
| 4 | 1905 | 17.40 | 17.79 | 21.78 | 24.84 | 28.32 | 28.69 | 27.67 | 27.47 | 26.29 | 26.16 | 22.07 | 18.71 | 17.966667 | 24.980000 | 27.5300 | 24.115 | 23.932500 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 112 | 2013 | 18.88 | 21.07 | 24.53 | 26.97 | 29.06 | 28.24 | 27.50 | 27.22 | 26.87 | 25.63 | 22.18 | 19.69 | 19.880000 | 26.853333 | 27.4575 | 23.905 | 24.820000 |
| 113 | 2014 | 18.81 | 20.35 | 23.34 | 26.91 | 28.45 | 29.42 | 28.07 | 27.42 | 26.61 | 25.38 | 22.53 | 19.50 | 19.553333 | 26.233333 | 27.8800 | 23.955 | 24.732500 |
| 114 | 2015 | 19.02 | 21.23 | 23.52 | 26.52 | 28.82 | 28.15 | 28.03 | 27.64 | 27.04 | 25.82 | 22.95 | 20.21 | 20.153333 | 26.286667 | 27.7150 | 24.385 | 24.912500 |
| 115 | 2016 | 20.92 | 23.58 | 26.61 | 29.56 | 30.41 | 29.70 | 28.18 | 28.17 | 27.72 | 26.81 | 23.90 | 21.89 | 22.130000 | 28.860000 | 28.4425 | 25.355 | 26.454167 |
| 116 | 2017 | 20.59 | 23.08 | 25.58 | 29.17 | 30.47 | 29.44 | 28.31 | 28.12 | 28.11 | 27.24 | 23.92 | 21.47 | 21.713333 | 28.406667 | 28.4950 | 25.580 | 26.291667 |
117 rows × 18 columns
In [79]:
temp.columns
Out[79]:
Index(['YEAR', 'JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP',
'OCT', 'NOV', 'DEC', 'Winter', 'Summer', 'Monsoon', 'Autumn',
'Yearly_Mean'],
dtype='object')
In [80]:
d = temp[['YEAR', 'JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP',
'OCT', 'NOV', 'DEC']]
In [81]:
temp
Out[81]:
| | YEAR | JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC | Winter | Summer | Monsoon | Autumn | Yearly_Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | 17.99 | 19.43 | 23.49 | 26.41 | 28.28 | 28.60 | 27.49 | 26.98 | 26.26 | 25.08 | 21.73 | 18.95 | 18.790000 | 26.060000 | 27.3325 | 23.405 | 24.224167 |
| 1 | 1902 | 19.00 | 20.39 | 24.10 | 26.54 | 28.68 | 28.44 | 27.29 | 27.05 | 25.95 | 24.37 | 21.33 | 18.78 | 19.390000 | 26.440000 | 27.1825 | 22.850 | 24.326667 |
| 2 | 1903 | 18.32 | 19.79 | 22.46 | 26.03 | 27.93 | 28.41 | 28.04 | 26.63 | 26.34 | 24.57 | 20.96 | 18.29 | 18.800000 | 25.473333 | 27.3550 | 22.765 | 23.980833 |
| 3 | 1904 | 17.77 | 19.39 | 22.95 | 26.73 | 27.83 | 27.85 | 26.84 | 26.73 | 25.84 | 24.36 | 21.07 | 18.84 | 18.666667 | 25.836667 | 26.8150 | 22.715 | 23.850000 |
| 4 | 1905 | 17.40 | 17.79 | 21.78 | 24.84 | 28.32 | 28.69 | 27.67 | 27.47 | 26.29 | 26.16 | 22.07 | 18.71 | 17.966667 | 24.980000 | 27.5300 | 24.115 | 23.932500 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 112 | 2013 | 18.88 | 21.07 | 24.53 | 26.97 | 29.06 | 28.24 | 27.50 | 27.22 | 26.87 | 25.63 | 22.18 | 19.69 | 19.880000 | 26.853333 | 27.4575 | 23.905 | 24.820000 |
| 113 | 2014 | 18.81 | 20.35 | 23.34 | 26.91 | 28.45 | 29.42 | 28.07 | 27.42 | 26.61 | 25.38 | 22.53 | 19.50 | 19.553333 | 26.233333 | 27.8800 | 23.955 | 24.732500 |
| 114 | 2015 | 19.02 | 21.23 | 23.52 | 26.52 | 28.82 | 28.15 | 28.03 | 27.64 | 27.04 | 25.82 | 22.95 | 20.21 | 20.153333 | 26.286667 | 27.7150 | 24.385 | 24.912500 |
| 115 | 2016 | 20.92 | 23.58 | 26.61 | 29.56 | 30.41 | 29.70 | 28.18 | 28.17 | 27.72 | 26.81 | 23.90 | 21.89 | 22.130000 | 28.860000 | 28.4425 | 25.355 | 26.454167 |
| 116 | 2017 | 20.59 | 23.08 | 25.58 | 29.17 | 30.47 | 29.44 | 28.31 | 28.12 | 28.11 | 27.24 | 23.92 | 21.47 | 21.713333 | 28.406667 | 28.4950 | 25.580 | 26.291667 |
117 rows × 18 columns
In [82]:
df
Out[82]:
| | Date | Year | Month | Temprature |
|---|---|---|---|---|
| 0 | 1901-01-01 | 1901 | JAN | 17.99 |
| 1 | 1901-02-01 | 1901 | FEB | 19.43 |
| 2 | 1901-03-01 | 1901 | MAR | 23.49 |
| 3 | 1901-04-01 | 1901 | APR | 26.41 |
| 4 | 1901-05-01 | 1901 | MAY | 28.28 |
| ... | ... | ... | ... | ... |
| 1399 | 2017-08-01 | 2017 | AUG | 28.12 |
| 1400 | 2017-09-01 | 2017 | SEP | 28.11 |
| 1401 | 2017-10-01 | 2017 | OCT | 27.24 |
| 1402 | 2017-11-01 | 2017 | NOV | 23.92 |
| 1403 | 2017-12-01 | 2017 | DEC | 21.47 |
1404 rows × 4 columns
In [83]:
d = {"Month":df["Month"].unique(),"Temprature" :dtr.predict(test)}
In [84]:
d_2018 = pd.DataFrame(d)
In [85]:
d_2018
Out[85]:
| | Month | Temprature |
|---|---|---|
| 0 | JAN | 19.02 |
| 1 | FEB | 23.08 |
| 2 | MAR | 25.58 |
| 3 | APR | 29.17 |
| 4 | MAY | 30.41 |
| 5 | JUN | 29.44 |
| 6 | JUL | 28.31 |
| 7 | AUG | 28.17 |
| 8 | SEP | 28.11 |
| 9 | OCT | 27.24 |
| 10 | NOV | 23.92 |
| 11 | DEC | 21.47 |
In [86]:
final = pd.concat([df,d_2018], axis = 0)
In [87]:
final["Year"] = final["Year"].fillna(2018).astype(int)
In [88]:
final
Out[88]:
| | Date | Year | Month | Temprature |
|---|---|---|---|---|
| 0 | 1901-01-01 | 1901 | JAN | 17.99 |
| 1 | 1901-02-01 | 1901 | FEB | 19.43 |
| 2 | 1901-03-01 | 1901 | MAR | 23.49 |
| 3 | 1901-04-01 | 1901 | APR | 26.41 |
| 4 | 1901-05-01 | 1901 | MAY | 28.28 |
| ... | ... | ... | ... | ... |
| 7 | NaT | 2018 | AUG | 28.17 |
| 8 | NaT | 2018 | SEP | 28.11 |
| 9 | NaT | 2018 | OCT | 27.24 |
| 10 | NaT | 2018 | NOV | 23.92 |
| 11 | NaT | 2018 | DEC | 21.47 |
1416 rows × 4 columns
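In the combined table above, the 2018 rows have `NaT` in `Date` and restart the index at 0, because the predicted frame has neither a `Date` column nor a continuing index. One possible way to avoid both, sketched here on a four-row stand-in (the names `history`, `forecast` and `combined` are illustrative), is to fill in the missing columns before concatenating and to pass `ignore_index=True`.

```python
import pandas as pd

# Stand-in for the tail of the historical table.
history = pd.DataFrame({
    "Date": pd.to_datetime(["2017-11-01", "2017-12-01"]),
    "Year": [2017, 2017],
    "Month": ["NOV", "DEC"],
    "Temprature": [23.92, 21.47],  # column name spelled as in the notebook
})

# Stand-in for the predicted rows, which only carry month and value.
forecast = pd.DataFrame({"Month": ["JAN", "FEB"], "Temprature": [19.02, 23.08]})

# Add the missing columns first, so the concatenation creates no NaT or NaN.
forecast["Year"] = 2018
forecast["Date"] = pd.to_datetime(
    forecast["Month"] + " " + forecast["Year"].astype(str), format="%b %Y"
)

# Match the column order of the history and renumber the rows 0..n-1.
combined = pd.concat([history, forecast[history.columns]], ignore_index=True)
print(combined)
```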